This document examines a variety of basic state level controls for the manufacturing economy in each state.

Dimensions of Interest

We seek to develop a categorization of manufacturing economies along a few dimensions of interest:

  1. Relative share of manufacturing in general (in both employment and GDP)
  2. Relative trends in the strength of manufacturing (in both employment and GDP)
  3. Diversity of manufacturing (number of 3 digit manufacturing NAICS code with high location quotients) (Porter; Delgado; Stern)
  4. Geography of manufacturing (number of counties with 2 digit manufacturing location quotient) (Hidalgo)
  5. Ratio of small manufacturers to large manufacturers (Bernard, etc.)

We will rely on QCEW and BEA data here to construct these. We are interested in state trends, relative to national trends, and only make use of county data for the third proposed categorization approach described above (geography of manufacturing).

We focus on a five year period from 2017-2021 to establish the current status of state manufacturing ecosystems, but may need to bring in more years. In addition, we use 2010 as a “baseline” year, to examine recent longer-term trends in state manufacturing economies.

From previous work, we have some previously downloaded data that can be used for both point 1 and 3 above.

Crosswalks

Aggregation Level Crosswalk

# agglvl_dict <- read.csv("https://www.bls.gov/cew/classifications/aggregation/agg-level-titles-csv.csv", header = TRUE)

agglvl_dict <- read.csv(here("State Data/agg-level-titles-csv.csv"), header = TRUE)

Area Cross-Walk

area_codes <- readRDS(here("State Data/area_crosswalk.RDS")) %>% 
  mutate(area_title = str_remove(area_title, " -- Statewide")) %>% 
  mutate(area_fips = case_when(
    nchar(area_fips) == 4 ~ paste("0", area_fips, sep = ""), 
    TRUE ~ area_fips
  ))
states <- data.frame(state_abbr = state.abb, area_title = state.name)
area_codes <- left_join(area_codes, states)
## Joining with `by = join_by(area_title)`

Import Employment Data

We start by importing some previously downloaded data and cross-walks that will be useful throughout the rest of our work.

qcew_data <- readRDS(here("State Data/qcew_data.RDS")) %>% 
  filter(industry_code %in% c("10", "31-33"))

We clean variable names and select specific variables of interest.

qcew_data <- left_join(qcew_data, agglvl_dict) %>% 
  select(area_fips, industry_code, agglvl_code, agglvl_title, disclosure_code, 
         estabs = annual_avg_estabs, emp = annual_avg_emplvl, earnings = avg_annual_pay, estabs_change = oty_annual_avg_estabs_pct_chg, emp_change = oty_annual_avg_emplvl_pct_chg, earnings_change = oty_total_annual_wages_pct_chg, year, 
         lq_estabs = lq_annual_avg_estabs, lq_emp = lq_annual_avg_emplvl, lq_wages = lq_total_annual_wages, lq_pay = lq_avg_annual_pay)
## Joining with `by = join_by(agglvl_code)`

1) Manufacturing Economies

We seek to establish a baseline measure for state manufacturing economies, which can capture their overall and relative size and trends over time. The manufacturing economy plays a unique role, and varies across industrial sectors as well as geographies. State manufacturing economies consist of unique sets of manufacturing firms, and surrounding institutional and economic environments. Existing theories tells us that particular manufacturing sectors are stronger than others, and describe how industrial clusters can promote efficiency.

As such, we seek to explore data on the size, scope, and share of state manufacturing economies over time. When necessary, we may aggregate to the county and 3-digit level for fine-grain data, but we primarily seek state-level measures of the manufacturing economy.

These economic measures might still be confounded by political pressure, attitudes towards manufacturing in the state, and state permitting regulations. (NOTE: in addition, we may want to measure the number of available greenfield/brownfield factors, as well as utility rates in the state, in addition to labor)

We start by focusing on state level trends. We have employment as well as GDP data at the state and national levels. Here, our goal is to benchmark state performance against the national average to contextualize changes in state manufacturing economies.

We first focus on employment data.

qcew_state <- qcew_data %>% 
  filter(str_detect(agglvl_title, "County|MSA", negate = TRUE))
qcew_state %>% 
  group_by(agglvl_title, industry_code)%>%
  count() 
## # A tibble: 4 × 3
## # Groups:   agglvl_title, industry_code [4]
##   agglvl_title                                  industry_code     n
##   <chr>                                         <chr>         <int>
## 1 National, NAICS Sector -- by ownership sector 31-33             5
## 2 National, Total -- by ownership sector        10                5
## 3 State, NAICS Sector -- by ownership sector    31-33           265
## 4 State, Total -- by ownership sector           10              265

From the data that we downloaded earlier, we see that we have five years of information about both total establishments and employment (Industry Code 10), as well as manufacturing establishments and employment (Industry Code 31-33).

We want to move Total Industry information to a separate column to better calculate the manufacturing share of the economy.

To do so, we create a function.

total_capture <- function(data) { 
  tot_data <- data %>% 
    filter(industry_code == "10") %>% 
    select(estabs, emp, earnings, estabs_change, emp_change, earnings_change, year, area_fips) 
  
  colnames(tot_data)[1:6] <- paste("tot", colnames(tot_data)[1:6], sep = "_")
  
  return(tot_data)
  }

Note that we will need to do this separately for national and state level data.

qcew_state <- qcew_data %>% 
  filter(str_detect(agglvl_title, "State"))

qcew_national <- qcew_data %>% 
  filter(str_detect(agglvl_title, "National"))
state_total <- qcew_state %>% 
  total_capture()

national_total <- qcew_national %>% 
  total_capture()

We then filter out these aggregate measures from our earlier dataframe, and add them as column-wise variables that vary by year.

qcew_state_clean <- qcew_state %>% 
  filter(industry_code != 10) %>% 
  left_join(state_total) 
## Joining with `by = join_by(area_fips, year)`
qcew_national_clean <- qcew_national %>% 
  filter(industry_code != 10) %>% 
  left_join(national_total) %>% 
  select(year, estabs, emp, earnings, estabs_change, emp_change, earnings_change, tot_estabs, tot_emp, tot_earnings) %>% 
  mutate(manf_share = estabs / tot_estabs, 
         manf_emp_share = emp / tot_emp)
## Joining with `by = join_by(area_fips, year)`
colnames(qcew_national_clean)[2:12] <- paste("usa", colnames(qcew_national_clean)[2:12], sep = "_")

We then calculate the manufacturing share of the state economy for each state, and each year in our data.

qcew_prelim <- qcew_state_clean %>% 
  select(area_fips, estabs, tot_estabs, emp, tot_emp, lq_estabs, lq_emp, lq_wages, year) %>% 
  mutate(manf_share = estabs / tot_estabs, 
         manf_emp_share = emp / tot_emp)
qcew_manf <- area_codes %>% 
  left_join(states) %>% 
  left_join(qcew_prelim, .) %>% 
  left_join(qcew_national_clean)
## Joining with `by = join_by(area_title, state_abbr)`
## Joining with `by = join_by(area_fips)`
## Joining with `by = join_by(year)`

To get 2010 data, we have to use a different previously constructed dataset.

qcew_10 <- readRDS(here("State Data/qcew_state.RDS")) %>% 
  filter(year == 2010) %>% 
  distinct()
qcew_national_10 <- read_csv(here("State Data/2010.annual US000 U.S. TOTAL.csv")) %>% 
  filter(industry_code %in% c("10", "31-33"), own_code == 5) %>% 
  select(area_fips, industry_code, estabs = annual_avg_estabs_count, emp = annual_avg_emplvl, earnings = avg_annual_pay, year)
## Rows: 4771 Columns: 43
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (11): area_fips, industry_code, qtr, disclosure_code, area_title, own_ti...
## dbl (32): own_code, agglvl_code, size_code, year, annual_avg_estabs_count, a...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
total_2010 <- function(data) { 
  tot_data <- data %>% 
    filter(industry_code == "10") %>% 
    select(estabs, emp, earnings, year, area_fips) 
  
  colnames(tot_data)[1:3] <- paste("tot", colnames(tot_data)[1:3], sep = "_")
  
  return(tot_data)
  }
qcew_nat_10 <- qcew_national_10 %>% 
  total_2010() %>% 
  left_join(qcew_national_10 %>% filter(industry_code != "10"), .) %>% 
  select(year, estabs, emp, earnings, tot_estabs, tot_emp, tot_earnings) %>% 
  mutate(manf_share = estabs / tot_estabs, 
         manf_emp_share = emp / tot_emp)
## Joining with `by = join_by(area_fips, year)`
colnames(qcew_nat_10)[2:9] <- paste("usa", colnames(qcew_nat_10)[2:9], sep = "_")
qcew_10_clean <- qcew_10 %>% 
  filter(industry_code == "31-33") %>% 
  select(area_fips, estabs, tot_estabs, emp, tot_emp, year) %>% 
  mutate(manf_share = estabs / tot_estabs, 
         manf_emp_share = emp / tot_emp) %>% 
  left_join(qcew_nat_10) 
## Joining with `by = join_by(year)`
qcew_10_maps <- area_codes %>% 
  left_join(states) %>% 
  left_join(qcew_10_clean, .)
## Joining with `by = join_by(area_title, state_abbr)`
## Joining with `by = join_by(area_fips)`
qcew_manf_all <- bind_rows(qcew_manf, qcew_10_maps)

Import GDP Data

gdp_data <- read_csv(here("State Data/state_gdp_all.csv"))
## Rows: 1265 Columns: 16
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (5): GeoFips, GeoName, Description, 2011, 2016
## dbl (11): LineCode, 2010, 2012, 2013, 2014, 2015, 2017, 2018, 2019, 2020, 2021
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

We start by focusing on two digit NAICS codes. We also create a dataframe of State GDP to benchmark against and return to in the future.

manf_gdp_share <- gdp_data %>% 
  filter(LineCode == 12 | LineCode == 2) %>% 
  mutate(across(.cols = -c(1:4), ~ as.numeric(.x)/lag(as.numeric(.x))), 
         area_fips = as.character(GeoFips),
         area_fips = case_when( 
           str_length(GeoFips) < 5 ~ paste("0", GeoFips, sep = ""),
           TRUE ~ area_fips)) %>% 
  filter(LineCode == 12)
state_gdp <- gdp_data %>% 
  filter(LineCode == 2) %>% 
  mutate(across(.cols = -c(1:4), ~ as.numeric(.x))) %>% 
  pivot_longer(cols = -c(1:4), names_to = "year", values_to = "state_gdp") %>% 
  mutate(year = as.numeric(year),
         area_fips = as.character(GeoFips),
         area_fips = case_when( 
           str_length(GeoFips) < 5 ~ paste("0", GeoFips, sep = ""),
           TRUE ~ area_fips))
state_gdp %>% saveRDS(here("State Data/state_gdp.RDS"))
manf_gdp <- gdp_data %>% 
  filter(LineCode == 12) %>% 
  mutate(across(.cols = -c(1:4), ~ as.numeric(.x))) %>% 
  pivot_longer(cols = -c(1:4), names_to = "year", values_to = "gdp_manf") %>% 
  mutate(area_fips = as.character(GeoFips), 
         year = as.numeric(year), 
    area_fips = case_when( 
           str_length(GeoFips) < 5 ~ paste("0", GeoFips, sep = ""),
           TRUE ~ area_fips))

We separate out national data.

gdp_nat <- state_gdp %>% 
  filter(GeoName == "United States") %>% 
  mutate(year = as.numeric(year))

manf_gdp_nat <- manf_gdp %>% 
  filter(GeoName == "United States") %>% 
  mutate(year = as.numeric(year))
gdp_sum <- gdp_nat %>%
  select(-c(LineCode, Description)) %>% 
  left_join(manf_gdp_nat) %>% 
  select(-c(LineCode, Description)) %>% 
  mutate(manf_share = gdp_manf/state_gdp, 
         manf_change = gdp_manf/lag(gdp_manf) - 1, 
         nat_change = state_gdp/lag(state_gdp) - 1) 
## Joining with `by = join_by(GeoFips, GeoName, year, area_fips)`
manf_point <- geom_point(size = 4, shape = 21, color = "black", fill = brewer.pal(9, "Set3")[5])
gdp_size_time <- gdp_sum %>% 
  ggplot(aes(x = year, gdp_manf, group = year)) +
  manf_point + 
  scale_y_continuous(labels = scales::unit_format(unit = "$B", scale = 1e-6)) + 
  theme_bw() + 
  labs(x = "Year", y = "GDP of the Manufacturing Sector, \n$2012 Dollars") + 
  axis_theme

gdp_size_time

total_point <- geom_point(size = 3, shape = 21, color = "black", fill = "#A5764D") 
gdp_time_share <- gdp_sum %>% 
  ggplot(aes(x = year, manf_share, group = year)) +
  total_point +
  scale_y_continuous(labels = scales::percent) + 
  theme_bw() + 
  labs(x = "Year", y = "National Manufacturing Share of GDP") + 
  axis_theme

gdp_time_share

We see that the manufacturing share of the economy declines each year between 2010 and 2020, but the overall size of the manufacturing economy grows.

nat_manf_trends <- gdp_time_share + gdp_size_time

Compared to the economy as a whole, the manufacturing sector has tended to

gdp_time_change <- gdp_sum %>% 
  ggplot() +
  geom_line(aes(x = year, nat_change), size = 2, color = brewer.pal(11, "Spectral")[11]) + 
  geom_line(aes(x = year, manf_change)) + 
  geom_point(aes(x = year, manf_change), size = 3) + 
  scale_y_continuous(labels = scales::percent) + 
  theme_bw() + 
  labs(x = "Year", y = "Annual Percent Change in GDP") + 
  axis_theme
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
gdp_time_change
## Warning: Removed 1 row containing missing values (`geom_line()`).
## Warning: Removed 1 row containing missing values (`geom_line()`).
## Warning: Removed 1 rows containing missing values (`geom_point()`).

In total size, the manufacturing economy grows more than 15% between 2010 and 2019, with an average yearly growth rate of 1.6%. This is slightly less than the average national growth rate of the economy, which increased by 24% between 2010 and 2019, with an average yearly growth rate of 2.42%.

We can also examine a 3-year moving average to try and smooth some of the year-to-year changes.

gdp_time_change_avg <- gdp_sum %>% 
  filter(year < 2020) %>% 
  mutate(across(.cols = c(manf_change, nat_change), ~(.x + lag(.x, 1) + lag(.x, 2))/3)) %>% 
  mutate(manf_change_avg = (manf_change + lag(manf_change, 1) + lag(manf_change, 2))/3) %>% 
  filter(year > 2012) %>% 
  ggplot() +
  geom_line(aes(x = year, manf_change)) + 
  geom_point(aes(x = year, manf_change), size = 3) + 
  geom_line(aes(x = year, nat_change), size = 2, color = brewer.pal(11, "Spectral")[11]) + 
  scale_y_continuous(labels = scales::percent) + 
  theme_bw() + 
  labs(x = "Year", y = "3 Year Moving Averge Annual % Change in GDP") + 
  axis_theme

gdp_time_change_avg

Smoothing this way, we see a consistent, year over year increase in the size of the manufacturing economy, growing slightly less than the economy as a whole until 2018, when the 3-year average growth in the manufacturing economy outpaces the national 3-year average growth in the economy.

gdp_time_change + gdp_time_change_avg
## Warning: Removed 1 row containing missing values (`geom_line()`).
## Removed 1 row containing missing values (`geom_line()`).
## Warning: Removed 1 rows containing missing values (`geom_point()`).

We save these national benchmarks for later reference and merging into cells.

nat_gdp_trends <- gdp_sum %>% 
  select(year, nat_gdp_all = state_gdp, nat_gdp_manf = gdp_manf, nat_manf_share = manf_share, nat_manf_change = manf_change, nat_change) 

State

We now create a function to calculate yearly state manufacturing and total economy GDP percent change

get_changes <- function(data) {
  data %>% 
    arrange(year, .by_group = TRUE) %>% 
  mutate(manf_change = gdp_manf/lag(gdp_manf) - 1, 
         change_3 = (manf_change + lag(manf_change, 1) + lag(manf_change, 2))/3) 
}
manf_gdp_change <- manf_gdp %>% 
  filter(GeoName != "United States") %>% 
  group_by(GeoName) %>% 
  get_changes() %>% 
  ungroup() %>% 
  select(-c(GeoName, GeoFips, LineCode, Description)) %>% 
  left_join(area_codes) %>% 
  left_join(states)
## Joining with `by = join_by(area_fips)`
## Joining with `by = join_by(area_title, state_abbr)`

We perform some further cleaning.

manf_change_clean <- state_gdp %>% 
  filter(GeoName != "United States") %>% 
  group_by(GeoName) %>% 
  rename(gdp_manf = state_gdp) %>% 
  get_changes() %>% 
  ungroup() %>% 
  select(area_fips, year, state_gdp = gdp_manf, state_change = manf_change, st_change_3 = change_3) %>% 
  left_join(manf_gdp_change) %>% 
  filter(!is.na(state_abbr)) 
## Joining with `by = join_by(area_fips, year)`

We can then use this dataframe to observe trends over time at the state level.

manf_change_clean %>% 
  ggplot() + 
  geom_line(aes(x = year, y = manf_change, group = state_abbr))
## Warning: Removed 50 rows containing missing values (`geom_line()`).

However, due to the high number of states in our sample, this approach does not reveal too many insightful trends.

Instead, let us consider looking at just the time period between 2010 and 2019. We will define a function that lets us calculate the percent change between any two years.

manf_state_calc <- function(data, min_year, max_year, sel_vars, mut_vars){ 
  
  data %>% 
    filter(year == min_year | year == max_year)%>%
    mutate(across(.cols = {{ mut_vars }}, ~ (as.numeric(.x)/lag(as.numeric(.x)) - 1 ))) %>% 
    filter(year == max_year) %>% 
    select( {{ sel_vars }} )

}
manf_2010_2019 <- manf_change_clean %>% 
  manf_state_calc(2010, 2019, sel_vars = c(state_abbr, gdp_manf, state_gdp), mut_vars = c(gdp_manf, state_gdp)) %>% 
  rename(mgc_2010_2019 = gdp_manf, sgc_2010_2019 = state_gdp)
nat_2010_2019 <- nat_gdp_trends %>% 
  manf_state_calc(2010, 2019, sel_vars = c(nat_gdp_manf, nat_gdp_all), mut_vars = c(nat_gdp_manf, nat_gdp_all)) %>% 
  rename(ngc_2010_2019 = nat_gdp_all, nmgc_2010_2019 = nat_gdp_manf)
manf_col <- brewer.pal(9, "Set3")[5]
total_col <- "#A5764D"
state_change <- manf_2010_2019 %>% 
  group_by(state_abbr) %>% 
  left_join(manf_change_clean %>% 
              filter(year == 2019) %>% 
              select(year, state_abbr, state_gdp)) %>% 
  mutate(manf_sign = mgc_2010_2019 > sgc_2010_2019 ) %>% 
  ggplot() + 
   geom_segment(aes(x = reorder(state_abbr, state_gdp), y = mgc_2010_2019, xend = reorder(state_abbr, state_gdp), yend = sgc_2010_2019, color = manf_sign), size = 2) +
  geom_point(aes(x = reorder(state_abbr, state_gdp), y = mgc_2010_2019), size = 4, shape = 21, color = "black", fill = manf_col) + 
  geom_point(aes(x = reorder(state_abbr, state_gdp), y = sgc_2010_2019), size = 3, shape = 21, color = "black", fill = total_col) + 
  geom_hline(yintercept = 0, color = "black") + 
  guides(fill = "none", color = "none") + 
  scale_color_manual(values = c(total_col, manf_col))+
  labs(x = "", y = "Percent Change in GDP, 2010 - 2019") + 
  coord_flip() + 
  theme_bw() + 
  axis_theme_blank
## Joining with `by = join_by(state_abbr)`
state_change + 
  geom_hline(yintercept = nat_2010_2019$nmgc_2010_2019[1], color = manf_col) + 
  geom_hline(yintercept = nat_2010_2019$ngc_2010_2019[1], color = total_col) 

We now turn to explore the manufacturing share, rather than the size.

manf_change_clean %>% 
  group_by(state_abbr) %>% 
  summarize(avg_change_manf = mean(manf_change, na.rm = TRUE), avg_change_all = mean(state_change, na.rm = TRUE)) %>% 
  ggplot() + 
  geom_point(aes(x = reorder(state_abbr, avg_change_manf), y = avg_change_manf)) + 
  geom_point(aes(x = reorder(state_abbr, avg_change_manf), y = avg_change_all), color = brewer.pal(9, "Set3")[3]) + 
  coord_flip()
manf_gdp_share_clean <- area_codes %>% 
  left_join(states) %>% 
  left_join(manf_gdp_share, .) %>% 
  filter(!is.na(state_abbr)) %>% 
  select(c(5:16), state_abbr, area_fips) %>% 
  pivot_longer(cols = -c(state_abbr, area_fips), names_to = "year", values_to = "manf_gdp_share") %>% 
  mutate(year = as.numeric(year))
## Joining with `by = join_by(area_title, state_abbr)`
## Joining with `by = join_by(area_fips)`
manf_gdp_all <- manf_gdp %>% 
  filter(GeoName != "United States") %>% 
  select(area_fips, year, gdp_manf) %>% 
  left_join(manf_gdp_share_clean, . ) 
## Joining with `by = join_by(area_fips, year)`

Geographic Concentration of Industry

We return to our QCEW data, and focus on the county level to understand the relative distribution of manufacturing strength in a state.

qcew_county <- qcew_data %>% 
  filter(str_detect(agglvl_title, "County, NAICS")) %>% 
  mutate(st = round(as.numeric(area_fips)/1000, 0)) 
qcew_county <- area_codes %>% 
  left_join(states) %>% 
  rename(st_fips = area_fips) %>% 
  left_join(qcew_county, .) %>% 
  group_by(year, state_abbr) %>% 
  mutate(county_count = n())
## Joining with `by = join_by(area_title, state_abbr)`
## Joining with `by = join_by(st)`

For the purposes of this part of the analysis, we will focus on 2019.

qcew_county %>% 
  filter(year == 2010 | year == 2019) %>% 
  view()
library(ggridges)
qcew_county %>% 
  filter(year == 2019, lq_emp != 0, !is.na(state_abbr)) %>%
  ggplot(aes(x = lq_emp, y = state_abbr, group = state_abbr, fill = state_abbr)) + 
  geom_density_ridges2()
## Picking joint bandwidth of 0.303

To understand the geographic distribution of manufacturing in a region, we can examine the mean of the employment location quotient, the standard deviation, as well as the max and the total count of counties with location quotient greater than 2. Specifically, we can look at the max location quotient, and the number of counties greater than 2. States with a high max location quotient but a low number of total counties with a LQ > 1.5 are considered to be geographically concentrated, while states with more counties with LQ > 1.5 are considered to be geographically distributed.

qcew_county %>% 
  saveRDS(here("State Data/county_employment.RDS"))

We can merge our qcew county level data with our existing manufacturing dataset.

emp_hhi_text <- qcew_county %>% 
  filter(year == 2019) %>% 
  select(county_fips = area_fips, year, county_emp = emp, state_abbr, county_lq_emp = lq_emp) %>% 
  left_join(
    manf_data %>% 
      filter(year == 2019, !is.na(state_abbr))
  ) %>% 
  mutate(county_emp_share = county_emp/emp, 
         county_emp_share_sq = county_emp_share**2) %>% 
  group_by(state_abbr) %>% 
  reframe(hhi_emp = sum(county_emp_share_sq, na.rm = TRUE), max_county_lq_emp = max(county_lq_emp)) %>% 
  filter(!is.na(state_abbr)) %>% 
  ggplot() + 
  geom_text(aes(hhi_emp, max_county_lq_emp, label = state_abbr)) + 
  theme_bw() + 
  labs(x = "State Manufacturing Employment by County HHI", y = "State Maximum County Manufacturing Location Quotient") + 
  axis_theme
## Joining with `by = join_by(year, state_abbr)`
emp_hhi_text

emp_hhi <- qcew_county %>% 
  filter(year == 2019) %>% 
  select(county_fips = area_fips, year, county_emp = emp, state_abbr, county_lq_emp = lq_emp) %>% 
  left_join(
    manf_data %>% 
      filter(year == 2019, !is.na(state_abbr))
  ) %>% 
  mutate(county_emp_share = county_emp/emp, 
         county_emp_share_sq = county_emp_share**2) %>% 
  group_by(state_abbr) %>% 
  reframe(hhi_emp = sum(county_emp_share_sq, na.rm = TRUE), max_county_lq_emp = max(county_lq_emp)) %>% 
  filter(!is.na(state_abbr)) %>% 
  ggplot() + 
  geom_point(aes(hhi_emp, max_county_lq_emp), shape = 21, size = 4, color = "black", fill = manf_col, alpha = 0.8) + 
  theme_bw() + 
  labs(x = "State Manufacturing Employment by County HHI", y = "State Maximum County Manufacturing Location Quotient") + 
  axis_theme
## Joining with `by = join_by(year, state_abbr)`
emp_hhi

geog_lq <- qcew_county %>% 
  filter(year == 2019) %>% 
  group_by(state_abbr) %>% 
  mutate(lq_1 = lq_emp > 1.5) %>% 
  summarise(mean_lq = mean(lq_emp), 
            std_lq = sd(lq_emp), 
            max_lq = max(lq_emp), 
            lq_2 = sum(lq_1), 
            counties = max(county_count))
geog_graph <- geog_lq %>% 
  ggplot() + 
  geom_point(aes(x = max_lq, y = lq_2/counties), size = 2) + 
  scale_y_continuous(labels = scales::percent) + 
  labs(x = "Maximum County Manufacturing Employment Location Quotient \n2019", y = "Percent of Counties in State w/ \nManufacturing Employment LQ > 1.5 \n2019") + 
  theme_bw() + 
  axis_theme

geog_graph

As a test, we can see how the mean location quotient, the max, and the number of high location quotients varies across our earlier defined 12 state categories. We see that there appear to be some trends, but nothing compelling.

geog_lq %>% 
  left_join(state_cats) %>% 
  group_by(group_id) %>% 
  summarise(mean_lq = mean(mean_lq), 
            max_lq = max(max_lq), 
            lq_2 = sum(lq_2)) %>% 
  view()
## Joining with `by = join_by(state_abbr)`

We can also start to define some categories for our states. Based on the above graph, we propose delineating states across two dimensions: absolute strength in manufacturing, and concentration of manufacturing. In this case, we define absolute strength in manufacturing as: the state has at least one county with a manufacturing employment location quotient of at least 5. We define concentration in manufacturing based on the percent of counties in a state with a manufacturing employment location quotient of at least 1.5: states with less than 40% of counties are concentrated, while states with more than 40% of counties are distributed.

state_cats_2 <- geog_lq %>% 
  mutate(geog_strength = max_lq >= 5.0, 
         geog_conc = (lq_2/counties) < .4) %>% 
  select(state_abbr, geog_strength, geog_conc) %>% 
  left_join(state_cats)
## Joining with `by = join_by(state_abbr)`

Industry Controls

For industry controls, we again use both employment and GDP data.

qcewGetAreaData <- function(year, qtr, area) {
    url <- "http://data.bls.gov/cew/data/api/YEAR/QTR/area/AREA.csv"
    url <- sub("YEAR", year, url, ignore.case=FALSE)
    url <- sub("QTR", tolower(qtr), url, ignore.case=FALSE)
    url <- sub("AREA", toupper(area), url, ignore.case=FALSE)
    read.csv(url, header = TRUE, sep = ",", quote="\"", dec=".", na.strings=" ", skip=0)
}
ind_concentration <- qcewGetAreaData(2019, "a", area_codes$area_fips[1]) %>% 
  filter(own_code == 5) %>% 
  left_join(agglvl_dict) %>% 
  filter(str_detect(agglvl_title, "NAICS 3")) %>% 
  mutate(industry_code = as.numeric(industry_code)) %>% 
  filter(industry_code >= 300 & industry_code <= 400) %>% 
  select(area_fips, lq_emp = lq_annual_avg_emplvl, industry_code, estabs = annual_avg_estabs, emp =  annual_avg_emplvl) 
## Joining with `by = join_by(agglvl_code)`
for (i in 2:(nrow(area_codes)-3)){
  ind_holder <- qcewGetAreaData(2019, "a", area_codes$area_fips[i]) %>% 
  filter(own_code == 5) %>% 
  left_join(agglvl_dict) %>% 
  filter(str_detect(agglvl_title, "NAICS 3")) %>% 
  mutate(industry_code = as.numeric(industry_code)) %>% 
  filter(industry_code >= 300 & industry_code <= 400) %>% 
  select(area_fips, lq_emp = lq_annual_avg_emplvl, industry_code, estabs = annual_avg_estabs, emp =  annual_avg_emplvl)
  
  ind_concentration <- bind_rows(ind_concentration, ind_holder)
}
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
ind_conc <- ind_concentration %>% 
  mutate(area_fips = as.character(area_fips), 
         area_fips = case_when(
    str_length(area_fips) < 5 ~ paste("0", area_fips, sep = ""), 
    TRUE ~ as.character(area_fips)
  ))
ind_conc <- area_codes %>% 
  left_join(states) %>% 
  left_join(ind_conc, .) %>% 
  group_by(state_abbr) %>% 
  mutate(ind_count = n())
## Joining with `by = join_by(area_title, state_abbr)`
## Joining with `by = join_by(area_fips)`
ind_lq <- ind_conc %>% 
  group_by(state_abbr) %>% 
  mutate(lq_1 = lq_emp >= 1.5) %>% 
  summarise(mean_lq = mean(lq_emp), 
            std_lq = sd(lq_emp), 
            max_lq = max(lq_emp), 
            lq_2 = sum(lq_1),
            industries = max(ind_count))
ind_graph <- ind_lq %>% 
  ggplot() + 
  geom_point(aes(x = max_lq, y = lq_2/industries), size = 2) +
  scale_y_continuous(labels = scales::percent) + 
  labs(x = "Maximum 3-Digit Manufacturing Employment Location Quotient \n2019", y = "Percent of 3-Digit Manufacturing NAICS in State w/ \nEmployment LQ > 1.5 \n2019") + 
  theme_bw() + 
  axis_theme

ind_graph

Industry Controls with GDP

We now turn to industry controls with GDP. We return to our BEA downloaded data.

First, for each 3-digit NAICS code, we want to compute the national share of GDP. This will then let us benchmark if a given state’s industry share of GDP is higher or lower than the national average.

ind_gdp_usa <- gdp_data %>% 
  filter(GeoName == "United States", LineCode > 12) %>% 
  mutate(across(.cols = -c(1:4), ~ as.numeric(.x))) %>% 
  pivot_longer(cols = -c(1:4), names_to = "year", values_to = "manf_ind_gdp") %>% 
  mutate(year = as.numeric(year)) %>% 
  left_join(gdp_nat %>% select(-c(LineCode, Description))) %>% 
  mutate(ind_share_usa = manf_ind_gdp/state_gdp) %>% 
  select(-c(GeoName, GeoFips))
## Joining with `by = join_by(GeoFips, GeoName, year)`

We then focus on the state level data, and compute each 3-digit manufacturing industry’s share of the state economy.

state_ind <- gdp_data %>% 
  filter(GeoName != "United States", LineCode > 12) %>% 
  mutate(across(.cols = -c(1:4), ~ as.numeric(.x))) %>% 
  pivot_longer(cols = -c(1:4), names_to = "year", values_to = "manf_ind_gdp") %>% 
  mutate(year = as.numeric(year)) %>% 
  left_join(state_gdp %>% 
              select(-c(LineCode, Description)) %>%
              mutate(year = as.numeric(year))) %>% 
  mutate(ind_share_state = manf_ind_gdp/state_gdp) %>% 
  left_join(ind_gdp_usa %>% 
              select(year, ind_share_usa, Description)) %>% 
  mutate(ind_lq = ind_share_state / ind_share_usa, 
         ind_strong = ind_share_state > ind_share_usa) %>%
  group_by(GeoName, year) %>% 
  mutate(ind_count = n())
## Warning: There were 2 warnings in `mutate()`.
## The first warning was:
## ℹ In argument: `across(.cols = -c(1:4), ~as.numeric(.x))`.
## Caused by warning:
## ! NAs introduced by coercion
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
## Joining with `by = join_by(GeoFips, GeoName, year)`
## Joining with `by = join_by(Description, year)`
state_ind %>% 
  saveRDS(here("State Data/state_ind.RDS"))
ind_hhi <- state_ind %>% 
  mutate(ind_share_state_sq = ind_share_state**2) %>% 
  group_by(area_fips) %>% 
  reframe(hhi_ind = sum(ind_share_state_sq, na.rm = TRUE), max_ind_lq = max(ind_lq)) %>% 
  left_join(area_codes) %>% 
  filter(!is.na(area_fips)) %>% 
  ggplot() + 
  geom_point(aes(hhi_ind, max_ind_lq), shape = 21, size = 4, color = "black", fill = manf_col, alpha = 0.8) + 
  geom_text(aes(hhi_ind, max_ind_lq, label = state_abbr)) + 
  theme_bw() + 
  labs(x = "State Manufacturing GDP by Industry HHI", y = "State Maximum Industry Manufacturing Location Quotient") + 
  axis_theme
## Joining with `by = join_by(area_fips)`
ind_lq_gdp <- state_ind %>% 
  group_by(GeoName, year) %>% 
  mutate(lq_1 = ind_lq >= 1.5) %>% 
  summarise(mean_lq_gdp = mean(ind_lq), 
            std_lq_gdp = sd(ind_lq), 
            max_lq_gdp = max(ind_lq), 
            lq_1_gdp = sum(lq_1),
            ind_strong = sum(ind_strong),
            industries = max(ind_count))
## `summarise()` has grouped output by 'GeoName'. You can override using the
## `.groups` argument.
ind_graph_gdp <- ind_lq_gdp %>% 
  filter(year == 2019) %>% 
  ggplot() + 
  geom_point(aes(x = max_lq_gdp, y = lq_1_gdp/industries), size = 2) +
  scale_y_continuous(labels = scales::percent) + 
  labs(x = "Maximum 3-Digit Manufacturing GDP Location Quotient \n2019", y = "Percent of 3-Digit Manufacturing NAICS in State w/ \nGDP LQ > 1.5 \n2019") + 
  theme_bw() + 
  axis_theme

ind_graph_gdp

We can do some basic analysis about trends over time to see how the mean and max 3-digit manufacturing GDP location quotients change over time.

ind_lq_gdp %>% 
  ggplot() +
  geom_line(aes(x = year, y = mean_lq_gdp, group = GeoName, color = GeoName))

ind_lq_gdp %>% 
  ggplot() +
  geom_line(aes(x = year, y = max_lq_gdp, group = GeoName, color = GeoName))

And we can also examine the correlation between the manufacturing GDP and employment location quotients.

mean_lq_comp <- ind_lq_gdp %>%
  filter(year == 2019) %>% 
  rename(area_title = GeoName) %>% 
  left_join(states) %>% 
  select(-c(industries)) %>% 
  left_join(ind_lq)  %>% 
  ggplot() + 
  geom_point(aes(x = mean_lq, y = mean_lq_gdp), size = 2) +
  labs(x = "Mean 3-Digit Manufacturing Employment Location Quotient \n2019", y = "Mean 3-digit Manufacturing GDP Location Quotient \n2019") + 
  theme_bw() + 
  axis_theme
## Joining with `by = join_by(area_title)`
## Joining with `by = join_by(state_abbr)`
mean_lq_comp

max_lq_comp <- ind_lq_gdp %>%
  filter(year == 2019) %>% 
  rename(area_title = GeoName) %>% 
  left_join(states) %>% 
  select(-c(industries)) %>% 
  left_join(ind_lq)  %>% 
  ggplot() + 
  geom_point(aes(x = max_lq, y = max_lq_gdp), size = 2) +
  labs(x = "Max 3-Digit Manufacturing Employment Location Quotient \n2019", y = "Max 3-digit Manufacturing GDP Location Quotient \n2019") + 
  theme_bw() + 
  axis_theme
## Joining with `by = join_by(area_title)`
## Joining with `by = join_by(state_abbr)`
max_lq_comp

lq_comp <- mean_lq_comp + max_lq_comp

We see that 3-digit manufacturing GDP and employment location quotients appear to be quite correlated. Because of missing data issues, we focus our categorization of states on the 3-digit manufacturing GDP location quotient.

As with geography, we propose delineating states across two dimensions: absolute strength in manufacturing, and concentration of manufacturing. In this case, we define absolute strength in manufacturing as: the state has at least 3-digit manufacturing industry with a GDP location quotient of at least 4. We define concentration in manufacturing based on the percent of 3-digit manufacturing industries in a state with a manufacturing GDP location quotient of at least 1.5: states with less than 35% of industries are concentrated, while states with more than 35% of industries are distributed. Note that the numbers we use for the industry categorization are slightly different than the numbers we use for the geographic categorization.

state_cats_3 <- ind_lq_gdp %>%
  filter(year == 2019) %>% 
  rename(area_title = GeoName) %>% 
  left_join(states) %>% 
  filter(!is.na(state_abbr)) %>% 
  mutate(ind_strength = max_lq_gdp >= 4.0, 
         ind_conc = (lq_1_gdp/industries) > .35) %>% 
  select(state_abbr, ind_strength, ind_conc) %>% 
  left_join(state_cats_2)
## Joining with `by = join_by(area_title)`
## Adding missing grouping variables: `area_title`
## Joining with `by = join_by(state_abbr)`

We can now explore how geographic and industry concentrations and strength relate to one another, as well as how they relate to the overall strength of the manufacturing economy in a state.

state_cats_3 %>% 
  group_by(ind_strength, ind_conc, geog_strength, geog_conc) %>% 
  count()
## # A tibble: 9 × 5
## # Groups:   ind_strength, ind_conc, geog_strength, geog_conc [9]
##   ind_strength ind_conc geog_strength geog_conc     n
##   <lgl>        <lgl>    <lgl>         <lgl>     <int>
## 1 FALSE        FALSE    FALSE         TRUE         22
## 2 FALSE        FALSE    TRUE          FALSE         1
## 3 FALSE        FALSE    TRUE          TRUE          5
## 4 FALSE        TRUE     TRUE          FALSE         3
## 5 TRUE         FALSE    FALSE         FALSE         1
## 6 TRUE         FALSE    FALSE         TRUE          7
## 7 TRUE         FALSE    TRUE          TRUE          1
## 8 TRUE         TRUE     FALSE         FALSE         2
## 9 TRUE         TRUE     TRUE          FALSE         8

We see that 22 states have low industry and geography strength, have distributed industry, and are geographically concentrated. (Arizona, California, Colorado, Delaware, Florida, Hawaii, Maryland, Massachusetts, Montana, Nebraska, New Hampshire, New Jersey, New Mexico, New York, North Dakota, Oklahoma, Rhode Island, South Dakota, Texas, Utah, Vermont, Virginia)

5 states have low industry strength and distributed, but have geographic strength and are concentrated. (Alaska, Illinois, Missouri, Nevada, and West Virginia)

1 state has low industry strength, sees industry distribution, but is geographically strong and concentrated (Pennsylvania)

3 states have low industry strength, see industry concentration, and are geographically strong and distributed (Minnesota, Ohio, Tennessee)

1 state has high industry strength and is distributed, but is geographically weak and distributed. (Connecticut)

7 states have high industry strength and distributed industry, but are geographically weak and concentrated. (Idaho, Kansas, Louisiana, Maine, Oregon, Washington, Wyoming)

1 state has high industry strength and distributed industry, and is geographically strong and concentrated. (Georgia)

2 states have high industry strength and distributed industry, and are geographically weak and distributed. (Michigan, Wisconsin)

8 states have high industry strength and distributed industry, and are geographically strong but concentrated. (Alabama, Arkansas, Indiana, Iowa, Kentucky, Mississippi, North Carolina, South Carolina)

state_cats_3 %>% 
  group_by(group_id) %>% 
  count(ind_strength, ind_conc, geog_conc, geog_strength)
## # A tibble: 32 × 6
## # Groups:   group_id [12]
##    group_id ind_strength ind_conc geog_conc geog_strength     n
##       <int> <lgl>        <lgl>    <lgl>     <lgl>         <int>
##  1        1 FALSE        FALSE    TRUE      FALSE             9
##  2        1 FALSE        FALSE    TRUE      TRUE              1
##  3        1 TRUE         FALSE    TRUE      FALSE             1
##  4        2 FALSE        FALSE    TRUE      FALSE             4
##  5        2 FALSE        FALSE    TRUE      TRUE              1
##  6        3 FALSE        FALSE    TRUE      FALSE             1
##  7        3 FALSE        FALSE    TRUE      TRUE              1
##  8        3 TRUE         FALSE    TRUE      FALSE             1
##  9        4 FALSE        FALSE    TRUE      FALSE             1
## 10        4 TRUE         FALSE    TRUE      FALSE             1
## # ℹ 22 more rows

State Categorizations

We now seek to combine the higher level manufacturing trends with the manufacturing and geographic composition categories.

We can simplify the 12 categories of state variation into five:

  1. Declining Manufacturing Economy:

States: Colorado, Maryland, Massachusetts, New Jersey, New Mexico, New York, North Dakota, Virginia, Washington, West Virginia

Description: These states have below-average manufacturing shares of employment and GDP in 2019, with both measures declining between 2010 and 2019.

  1. High-Output, Low Employment Manufacturing Economy

States: Alaska, Delaware, Florida, Rhode Island, Arizona

Description: These states have below-average manufacturing shares of employment and GDP in 2019. While the manufacturing share of employment decreased between 2010 and 2019, the manufacturing share of GDP increased.

  1. Emerging Manufacturing Economy:

States: Montana, Nevada, Wyoming

Description: These states have below-average manufacturing shares of employment and GDP in 2019. However, both their manufacturing share of employment and GDP increased between 2010 and 2019.

  1. Mixed Manufacturing Economy:

States: California, Louisiana, Texas Description: These states have below-average manufacturing shares of employment in 2019 but above-average manufacturing shares of GDP. Between 2010 and 2019, their manufacturing shares of employment declined, while the manufacturing shares of GDP increased (California) or declined (Louisiana, Texas).

  1. Strong Manufacturing Base

States: Alabama, Kentucky, Michigan Description: These states have above-average manufacturing shares of employment and GDP in 2019. Furthermore, both their manufacturing shares of employment and GDP grew between 2010 and 2019. They exhibit a combination of distributed manufacturing composition and geographic concentration.